Parallel Programming and Its Architectures Based on Data Access Separated Algorithm Kernels

Authors

Dake Liu

Joar Sohl

Jian Wang

Abstract

This article introduces a novel master-multi-SIMD architecture and its kernel-based (template-based) parallel programming flow as a parallel signal processing platform. The platform is named ePUMA (embedded Parallel DSP processor architecture with Unique Memory Access). The essential technique is to separate data access kernels from arithmetic computing kernels, so that the run-time cost of data access is minimized by running it in parallel with the arithmetic computation. The SIMD memory subsystem architecture based on the proposed flow dramatically improves total computing performance. The hardware system and programming flow primarily target low-power, high-performance embedded parallel computing at low silicon cost, for communications and similar real-time signal processing. DOI: 10.4018/978-1-61350-456-7.ch2.7
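The separation described in the abstract can be illustrated with a minimal double-buffering sketch. This is not the ePUMA toolchain or hardware: the names `access_kernel`, `compute_kernel`, and `run_pipeline` are hypothetical, and a Python worker thread stands in for a dedicated memory subsystem. The only point shown is that the next block is fetched while the current block is being computed.

```python
from concurrent.futures import ThreadPoolExecutor


def access_kernel(signal, start, block):
    """Hypothetical data access kernel: copy one block into a local buffer."""
    return list(signal[start:start + block])


def compute_kernel(buf, coeff):
    """Hypothetical arithmetic kernel: scale and accumulate a local buffer."""
    return sum(x * coeff for x in buf)


def run_pipeline(signal, block=4, coeff=2):
    """Overlap the fetch of block k+1 with the computation on block k."""
    starts = list(range(0, len(signal), block))
    results = []
    if not starts:
        return results
    # One worker plays the role of the data access path (an assumption
    # made for illustration, not a model of the real memory subsystem).
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(access_kernel, signal, starts[0], block)
        for nxt in starts[1:] + [None]:
            buf = pending.result()  # wait for the current block
            if nxt is not None:
                # Start fetching the next block before computing.
                pending = loader.submit(access_kernel, signal, nxt, block)
            results.append(compute_kernel(buf, coeff))
    return results


if __name__ == "__main__":
    print(run_pipeline(list(range(8)), block=4, coeff=2))
```

Because the arithmetic kernel only ever sees a ready local buffer, it contains no addressing logic of its own; in this sketch that is the property that lets the two kernels be written and scheduled independently.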


Related resources

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs

Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...


Pareto-based Multi-criteria Evolutionary Algorithm for Parallel Machines Scheduling Problem with Sequence-dependent Setup Times

This paper addresses an unrelated multi-machine scheduling problem with sequence-dependent setup time, release date and processing set restriction to minimize the sum of weighted earliness/tardiness penalties and the sum of completion times, which is known to be NP-hard. A Mixed Integer Programming (MIP) model is proposed to formulate the considered multi-criteria problem. Also, to solve the mo...


Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. A major obstacle to performance portability is the diverse a...


High performance combinatorial algorithm design on the Cell Broadband Engine processor

The Sony–Toshiba–IBM Cell Broadband Engine (Cell/B.E.) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-processing units (SPEs) integrated on-chip. While the Cell/B.E. processor is architected for multimedia applications with regular processing requirements, we are interested in its performance on problems with non-uniform memory a...


Solving the Problem of Scheduling Unrelated Parallel Machines with Limited Access to Jobs

Nowadays, with the successful application of the on-time production concept in areas such as production management and storage, completing jobs by their delivery time is considered a key issue in industrial environments. Unrelated parallel machine scheduling is a general form of the classic parallel machine problems. In some of the applications of unrelated parallel mac...



Journal title:

IJERTCS

Volume 1, Issue

Pages -

Publication date 2010
